Multiscale simulations reveal the role of PcrA helicase in protecting against spontaneous point mutations in DNA

Proton transfer across hydrogen bonds in DNA can produce non-canonical nucleobase dimers and is a possible source of single-point mutations when these forms mismatch under replication. Previous computational studies have revealed this process to be energetically feasible for the guanine-cytosine (GC) base pair, but the tautomeric product (G\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^*$$\end{document}∗C\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^*$$\end{document}∗) is short-lived. In this work we reveal, for the first time, the direct effect of the replisome enzymes on proton transfer, rectifying the shortcomings of existing models. Multi-scale quantum mechanical/molecular dynamics (QM/MM) simulations reveal the effect of the bacterial PcrA Helicase on the double proton transfer in the GC base pair. It is shown that the local protein environment drastically increases the activation and reaction energies for the double proton transfer, modifying the tautomeric equilibrium. We propose a regime in which the proton transfer is dominated by tunnelling, taking place instantaneously and without atomic rearrangement of the local environment. In this paradigm, we can reconcile the metastable nature of the tautomer and show that ensemble averaging methods obscure detail in the reaction profile. Our results highlight the importance of explicit environmental models and suggest that asparagine N624 serves a secondary function of reducing spontaneous mutations in PcrA Helicase.


Molecular Dynamics Equilibration
All energy minimisation, molecular mechanics, and molecular dynamics were performed using Gromacs 2018 (5), with the March 2019 version of the CHARMM36 force field, and SPCE water.
Prior to any dynamics energy minimisation was performed on both these systems using a steepest descent algorithm until the maximum force did not exceed 12 kJ/mol/nm.The molecular dynamics parameters for the geometry optimisation can be found in gromacs_parameters/minim_12.mdp.
Each of these systems was placed simulated in an NVT ensemble using a leap-frog integrator, timestep of 1 fs, and Nose-Hoover temperature coupling with time constant 0.2 ps and reference temperature 310 K, and three-dimensional periodic boundary conditions for a total of 3 ns.Within this time the systems were verified to have been equilibrated as the temperatures and RMSD stabilised.An example molecular dynamics parameter file for the molecular dynamics equilibration can be found in gromacs_parameters/eq500ps_2CX.mdp.

Hybrid Quantum Mechanics / Molecular Mechanics
Hybrid QM/MM calculations were performed with the Gromacs-CP2K distribution (5,9).The nucleobases of the relevant GC base pair were defined as the QM region, whose electronic structure would be solved CP2K(9) using density functional theory with the BLYP exchange correlation functional, DZVP-MOLOPT-GTH basis set, GTH-BLYP potential, electrostatic (point charge) embedding, and the VDW3 dispersion correction.For QM/MM dynamics, Gromacs applied the leap-frog algorithm, with forces obtained from DFT in CP2K for the QM region and CHARMM36 elsewhere.

Umbrella Sampling
Umbrella Sampling was performed with Gromacs-CP2K at a CHARMM36/DFT/BLYP level of theory using four distance reaction coordinates describing the double proton transfer as illustrated in Figure S1.Each umbrella sampling window applied a 20000 kJ/mol/nm 2 harmonic potential along the four reaction coordinates, and was simulated for at least 8ps per replica per window.Depending on the level of equilibration, up to the first 2 ps of each simulation were discarded.An example for the steered molecular dynamics parameter input file for Gromacs can be found in gromacs_parameters/umb_8ps_f20k_2CM_BLYP_notraj.mdp, and a corresponding example CP2K input file in cp2k_parameters/HS41R1_BLYP_VDW_w008.inp.
The double proton transfer pathways that were sampled are illustrated in Figure S2.The concerted/synchronous DPT is a simple linear interpolation between the canonical GC and the G * C * tautomer.The asynchronous pathway was taken from the equilibrium distance GC dimer ML-NEB pathways reported by Slocombe et al. (10) The potentials of mean force (PMF) along the sampled reaction were obtained by the Weighted Histogram Analysis Method (WHAM) implemented in Gromacs with a tolerance of 1.0 × 10 −6 and statistical variance was estimated with 100 bootstrapping samples.The four reaction coordinates were projected onto one dimension in an average relative change from the canonical minimum, which is the default for Gromacs' WHAM.The reaction coordinate z is thus defined as: Where RC1 0 is the canonical reference point for RC1, etc.The reaction asymmetries obtained from PMFs were used to calculate the equilibrium constants (K) using Equation 2.Where ∆G is a reaction free energy, R is the gas constant, and T is temperature (310K in this work).

Instantaneous Transfer Surface
To determine the instantaneous double proton transfer potential energy surface, snapshots were taken from the QM/MM umbrella sampling simulations corresponding to the canonical GC. 256 replica structures were created using Python and MolParse(6) mapping the two dimensional transfer landscape of the two protons.Each proton had its position interpolated between 0.9 and 2.5 Angstrom distance from its covalently bonded donor atom.The python code in analysis_code/grid.pywas used to generate the array of input structures.
For each of these grid structures a single point energy (SPE) was calculated using the previously described electrostatically embedded QM/MM in CP2K with two levels of theory: identical to the umbrella sampling methodology with BLYP/DZVP-MOLOPT-GTH and B3LYP/DZVP-MOLOPT-GTH which also utilised the auxillary density matrix method using the cFIT3 basis.For each SPE calculation a new CP2K input was constructed using the following parameter templates: BLYP cp2k_parameters/template_BLYP.inp and B3LYP cp2k_parameters/template_B3LYP_new.inp.
The grid of SPEs was interpolated using the CloughTocher piecewise cubic, C1 smooth, curvature-minimizing interpolant in two-dimensions inplemented in SciPy.(11) Within this surface local minima were determined using the BFGS optimiser implemented in ASE, (12) and the minimum energy path connecting the canonical and tautomeric states was obtained using the nudged elastic band (NEB) algorithm also implemented in ASE.All of this functionality was achieved the analysis code in analysis_code/plot_grid.py

G * C * Ensemble decay
To investigate the metastable nature of the G * C * tautomer, post DPT product structures were generated from each instantaneous PES, and these were placed in unbiased 310K NVT QM/MM MD simulations, the parameters for which can be found in gromacs_parameters/md5ps_qmmm_vel.mdp.The velocities were taken from the original US snapshot used to generate the surface.
To quantitatively compare the decays the time at which the system leaves the local potential energy minima has been estimated by considering the point where the two reaction coordinates cross over.For example, in the DC:N4-DC:H41 -DG:O6 transfer described by RC1 and RC3 (see Subsection 1.4) we consider the proton to be in the canonical well as long as RC1 < RC3.Additionally once the vector between the canonical minima and [RC1,RC3] is within 0.1 Angstrom, we consider the system to have 'arrived' at the canonical GC state.These lifetime results are shown in Figure S3.

Instantaneous PES at non-equilibrium separation
To obtain a strand separation trajectory in the DNA-Helicase complex a short (5ps) steered molecular dynamics simulation was performed on an equilibrated structure.A constant separating force (500 kJ/mol/nm 2 ) was applied to the DC:N1 and DG:N9 backbone atoms forcing them directly apart.See gromacs_parameters/pull_v1.mdp for exact simulation parameters.

Mutations of the PcrA Helicase sequence
Sequences for PcrA Helicase from 59 species were analysed around site 624 which is an asparagine (N624) in 51% of cases, other minor populations of 624 include arginine (R624, 24%), glutamine (Q624, 14%), and tyrosine (Y624, 12%).Table S4 shows the statistics of mutations in the sites 622 to 626.When considering vicinal residues to N624 multiple motifs appear.Motifs with N624 remain the most common, with FGNIQ, and FGNIH making up the 51% majority.When considering sites 622 to 626, a tyrosine at 624 (Y624) is always part of a FGYTN motif (12% frequency), while sequences containing glutamine Q624 and arginine R624 are more diverse, i.e. the vicinal positions 622, 623, 625, and 626 are not strongly correlated to the amino acid in 624.

Additional Molecular Dynamics trajectories for aqueous DNA and the PcrA Helicase-DNA complex
In addition to the 3 ns of equilibration time, three systems were simulated across a further 30 ns each to observe slower dynamics.The same Molecular Dynamics (MD) parameters were used as described in Subsection 1.2.

Additional Simulation Details
This section provides further details regarding the validation of the methods described in Section 1.

QM/MM level of theory
The exchange correlation functional BLYP with and without VDW3 dispersion corrections was compared to PBE and the semi-empirical PM6 for the concerted DPT in aqueous DNA, and the PMFs are shown in Figure S4.PM6 provided the greatest statistical uncertainty between all tested levels of theory and was thus not a suitable description of this reaction.Between BLYP, PBE, and BLYP+VdW3 there are small differences in the produced concerted double proton transfer barriers.BLYP+VdW3 was chosen for the umbrella sampling due to its faster sampling convergence.See Subsection 2.2.

Sampling convergence
To ensure that the potentials of mean force produced from the umbrella sampling simulations have converged the RMS difference between the reaction profiles was calculated for a given sampling time.For the chosen level of theory (BLYP VDW) a convergence is reached within 500ps of umbrella sampling, see Figure S5.Table S2 shows the total sampling time for each umbrella sampling PMF obtained in this work.Despite the computational expense of QM/MM umbrella sampling, we have ensured that the asynchronous pathway PMF for each studied system has converged to above 95%.

GC dimer separation during umbrella sampling simulations
During the umbrella sampling reactions the average donor-acceptor distances of the hydrogen bonds involved in proton transfer varied significantly and are reported in Figure S6.Mathematically the separation can be described as (RC1 + RC2 + RC3 + RC4)/4.We report a compression of the dimer during the sampled reaction, as previously observed by Slocombe et al. in ( 13) and (10).For the concerted double proton transfer a single compression is observed, corresponding to the single transition state, except the Helicase N-1 case which has a compressed tautomeric state.For the asynchronous pathway we observe two compression events corresponding to the two transition states resisting each proton transfer, again, the N-1 case was an outlier and did not exhibit this compression.This compression shows the high energetic cost of the proton transfer is alleviated by molecular degrees of freedom, not directly involved in the proton transfer.This motivates the need for a description of the proton transfer without such atomic rearrangement, especially if the proton transfer is known to occur near-instantaneously due to quantum tunnelling.

Additional Simulation Results
This section provides additional results to support discussions in the main text.

Umbrella sampling results for the synchronous DPT
While the asynchronous double proton transfer (DPT) was found to be energetically preferred with both umbrella sampling (US) and the instantaneous minimum energy path (MEP), we also report the synchronous proton transfer in Table 1 of the main article.The corresponding potentials of mean force (PMF's) are shown in S7.

Conformational behaviour of aqueous DNA and the PcrA-Helicase in longer molecular dynamics trajectories
Aqueous DNA, the wild type PcrA Helicase-DNA complex (N624), and the point mutation N624A were simulated in longer 33ns NVT MD (see Subsection 1.9).For Aqueous DNA, an overview of the dynamics is shown in Figure S15, which shows significant fraying at the ends of the duplex, while more stability is maintained in the middle.Figure S18 shows the donor-acceptor distance for the two hydrogen bonds involved in the DPT from these extended MD trajectories.The distribution for both bonds -DGO6-DCN4 (proton 1's donor and acceptor, and sum of RC1 and RC2) and DGN1-DCN3 (proton 2's donor and acceptor, and sum of RC3 and RC4) -centre on a donor-acceptor distance of 3 Å.
For the wild-type PcrA Helicase-DNA complex with asparagine N624, the conformations present in the extended MD are summarised in Figure S16, in which panel A shows significant fluctuations in the single-stranded DNA not passing through the enzyme, but relative stability across the rest of the complex.For base pair N, Panel B shows the conformation used for umbrella sampling and corresponds to the 3 Å peak of the donor-acceptor distance distribution shown in Figure S18.Panel C shows the a common alternative conformation corresponding to the more separated peak around 5.7 Å.While the GC base pair N-1 did open up slightly further than the aqueous DNA reference (deduced from the longer tail in the donor-acceptor distance distribution), the distribution still peaks at the equilibrium value (3 Å).
For the PcrA-DNA complex with the N624A point mutation, the smaller sidechain of alanine compared to asparagine has allows for increased mobility of the single-stranded DNA and a different alternative conformation is formed (see panel C of Figure S17).In this alternative conformation, DG662 and DC701 (base pair N) are no longer the last in the duplex as the DT663-DA702 (pair N+1) can reform.It is also observed that the alternative conformation does not stretch the base pair N (see the donor-acceptor distance in Figure S18), as most of the distribution is still centred around 3 Å.However, likely due to the increased mobility of the ssDNA, base pair N is opens more frequently during the simulations than in aqueous DNA, but not as often as in complex with wild-type PcrA.

Instantaneous DPT transfer surfaces
The individual instantaneous DPT transfer surfaces described in the main text are shown in Figures S9, S11, and S13 for duplex DNA, wild type PcrA Helicase (base pair N), and for helicase with the N624A mutation, respectively.The minimum energy paths through the surfaces are shown in Figures S10, S12, and S14, for the same three systems.

Fig. S1. Reaction coordinates used to map the double proton transfer
From these figures we can determine that the instantaneous PES and MEP through it is highly sensitive to conformational differences between snapshots.The locations and energies of the PES minimas are summarised in Table S3.The locations of the non-canonical minima vary between replicas of the same system, as the length of the hydrogen bonds and separation of the GC dimer fluctuate during the MD and US (see 2.3).Similarly, conformational differences lead to large energetic differences between the non-canonical minima between replicas, by as much as 28% relative standard deviation.Table S3.Properties of energetic minima in instantaneous potential energy surfaces.RC1 and RC3 are the donor-hydrogen distance for each hydrogen bond involved in the DPT (see Figure S1), and ∆E is the potential energy relative to the canonical (GC) minimum.Fig. S9.Instantaneous DPT PES for duplex DNA.Contoured heatmap illustrating the DPT's instantaneous energy surface, the minimum energy pathway within this landscape (white line with circular dots), and the decay path from G * C * to GC (black line with multicoloured circles).For the decay path, the dots are coloured according to the simulation time, with blue corresponding to the start of the simulation and green at the end of the decay.B3LYP_2CM_grid_new B3LYP_2CM_HL01 Fig. S11.Instantaneous DPT PES for the PcrA Helicase-DNA complex (base pair N).Contoured heatmap illustrating the DPT's instantaneous energy surface, the minimum energy pathway within this landscape (white line with circular dots), and the decay path from G * C * to GC (black line with multicoloured circles).For the decay path, the dots are coloured according to the simulation time, with blue corresponding to the start of the simulation and green at the end of the decay.

Fig. S7 .
Fig. S7.Potentials of mean force (PMF) for the synchronous double proton transfer in guanine-cytosine obtained from QM/MM Umbrella Sampling (US) simulations for Aqueous Duplex DNA (A) and two base pairs in the wild type PcrA Helicase-DNA Complex, pair N-1 (B) and pair N (C).

Fig. S8 .
Fig. S8.Archetypical conformations of the PcrA Helicase-DNA duplex with the wild type sequence and N624A point mutation.The base pair N GC dimer and amino acid in site 624 are shown in CPK representation.

Fig. S10 .
Fig. S10.Minimum energy paths through the instantaneous DPT PES for duplex DNA.

Fig. S12 .
Fig. S12.Minimum energy paths through the instantaneous DPT PES for the PcrA Helicase-DNA complex (base pair N).

Table S4 . Amino acid residues in sites 622-626 across 59 species of PcrA Helicase. Asterisks (*) are used as a wildcard denoting any amino acid. N is the number of occurences of the given motif, also given as a percentage in parentheses.
Instantaneous DPT PES for the PcrA Helicase-DNA complex (base pair N) with the N624A point mutation.Contoured heatmap illustrating the DPT's instantaneous energy surface, the minimum energy pathway within this landscape (white line with circular dots).